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Abstract 

We present a novel method in the family of particle MCMC methods 
that we refer to as particle Gibbs with ancestor sampling (PG-AS). Sim- 
ilarly to the existing PG with backward simulation (PG-BS) procedure, 
we use backward sampling to (considerably) improve the mixing of the 
PG kernel. Instead of using separate forward and backward sweeps as 
in PG-BS, however, we achieve the same effect in a single forward sweep. 
We apply the PG-AS framework to the challenging class of non-Markovian 
state-space models. We develop a truncation strategy of these models that 
is applicable in principle to any backward-simulation-based method, but 
which is particularly well suited to the PG-AS framework. In particular, 
as we show in a simulation study, PG-AS can yield an order-of-magnitude 
improved accuracy relative to PG-BS due to its robustness to the trun- 
cation error. Several application examples are discussed, including Rao- 
Blackwellized particle smoothing and inference in degenerate state-space 
models. This report is a slightly extended version of the paper [I]. 

1 Introduction 

State-space models (SSMs) are widely used to model time series and dynam- 
ical systems. The strong assumptions of linearity and Gaussianity that were 
originally invoked in state-space inference have been weakened by two decades 
of research on sequential Monte Carlo (SMC) and Markov chain Monte Carlo 
(MCMC). These Monte Carlo methods have not, however, led to substantial 
weakening of a further strong assumption, that of Markovianity. It remains a 



major challenge to develop inference algorithms for non-Markovian SSMs: 

xt+i ~ f(xt+i | 0,xx :t ), yt~9(yt\0,xi:t), (1) 

where 9 £ 6 is a static parameter with prior density p(9), x t is the latent 
state and y t is the observation at time t, respectively. Models of this form 
arise in many different application scenarios, either from direct modeling or 
via a transformation or marginalization of a larger model. We provide several 
examples in Section [5j 

To tackle the challenging problem of inference for non-Markovian SSMs, we 
work within the framework of particle MCMC (PMCMC), a family of inferential 
methods introduced in [2] . The basic idea in PMCMC is to use SMC to construct 
a proposal kernel for an MCMC sampler. Assume that we observe a sequence 
of measurements yi-.r- We are interested in finding the density p{x\-T, \ yi-.r), 
i.e., the joint posterior density of the state sequence and the parameter. In an 
idealized Gibbs sampler we would target this density by sampling as follows: (i) 
Draw 9* \ x\-t ~ p{9 | x\-t,Vi:t)\ (H) Draw x*. T | 9* ~ p{xi-T \ 9*,y 1: r)- The 
first step of this procedure can be carried out exactly if conjugate priors are used. 
For non-conjugate models, one option is to replace Step (i) with a Metropolis- 
Hastings step. However, Step (ii) — sampling from the joint smoothing density 
p{x\;T | 9, yi-.r) — is in most cases very difficult. In PMCMC, this is addressed by 
instead sampling a particle trajectory x\. T based on an SMC approximation of 
the joint smoothing density. More precisely, we run an SMC sampler targeting 
p{x\;T | 9*,yi;T)- We then sample one of the particles at the final time T, 
according to their importance weights, and trace the ancestral lineage of this 
particle to obtain the trajectory x\. T . This overall procedure is referred to as 
particle Gibbs (PG). 

The flexibility provided by the use of SMC as a proposal mechanism for 
MCMC seems promising for tackling inference in non-Markovian models. To 
exploit this flexibility we must address a drawback of PG in the high-dimensional 
setting, which is that the mixing of the PG kernel can be very poor when there 
is path degeneracy in the SMC sampler [3j[4] . This problem has been addressed 
in the generic setting of SSMs by adding a backward simulation step to the 
PG sampler, yielding a method denoted PG with backward simulation (PG- 
BS). It has been found that this considerably improves mixing, making the 
method much more robust to a small number of particles as well as larger data 
records (3j|3j. 

Unfortunately, however, the application of backward simulation is problem- 
atic for non-Markovian models. The reason is that we need to consider full state 
trajectories during the backward simulation pass, leading to 0(T 2 ) computa- 
tional complexity (see Section [4] for details). To address this issue, we develop 
a novel PMCMC method which we refer to as particle Gibbs with ancestor 
sampling (PG-AS) that achieves the effect of backward sampling without an 
explicit backward pass. As part of our development, we also develop a trunca- 
tion method geared to non-Markovian models. This method is a generic method 
that is also applicable to PG-BS, but, as we show in a simulation study in Sec- 
tion [6j the effect of the truncation error is much less severe for PG-AS than for 
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PG-BS. Indeed, we obtain up to an order of magnitude increase in accuracy in 
using PG-AS when compared to PG-BS in this study. 

Since we assume that it is straightforward to sample the parameter 6 of 
the idealized Gibbs sampler, we will not explicitly include sampling of 9 in the 
subsequent sections to simplify our presentation. 

This report is a slightly extended version of the paper [lj . 



2 Sequential Monte Carlo 

We first review the standard auxiliary SMC sampler, see e.g. [5j[6] . Let Jt(xv.t) 
for t = 1, . . . , T be a sequence of unnormalized densities on X', which we assume 
can be evaluated pointwisc in linear time. Let 7t(xi : t) be the corresponding nor- 
malized probability densities. For an SSM we would typically have Jt(%i:t) — 
p(xi-.t I Vi-.t) and 7 t (a: 1:t ) = p{x ut ,y ut ). Assume that {x™ t -\i w t^\}m=i is a 
weighted particle system targeting Jt-i( x i-.t-i)- This particle system is propa- 
gated to time t by sampling independently from a proposal kernel, 

at at 

M t (a t ,x t ) = '- 1 R t (x t 1 aff t _ x ). (2) 

In this formulation, the resampling step is implicit and corresponds to sampling 
the ancestor indices at- Note that a™ is the index of the ancestor particle 
of x™. When we write x™ t we refer to the ancestral path of x™. The factors 
v" 1 = ft(x™t)i known as adjustment multiplier weights, are used in the auxiliary 
SMC sampler to increase the probability of sampling ancestors that better can 
describe the current observation [6j. The particles are then weighted according 
to u>™ = Wt(x™ t ), where the weight function is given by 

W t {x 1:t ) = r r— r, (3) 

ryt-l{Xl:t-l)Vt-l{Xl:t-l)Rt(Xt \ Xl:t-l) 

for t > 2. The procedure is initiated by sampling from a proposal density 
x™ ~ Ri(xi) and assigning importance weights W™ = Wi(x™) with Wi(a;i) = 
llipx) / Rx{xx) ■ In PMCMC it is instructive to view this sampling procedure as 
a way of generating a single sample from the density 

N T N 

<Kx 1:T ,a 2:T ) 4 n iMon n m ^<^t) ^ 

m— 1 t—2 m—1 

on the space X NT x {1, . . . , A^}^ 7 ^ 1 ). Here we have introduced the boldface 
notation x t = {x\ : . . . , x^} and similarly for the ancestor indices. 



3 Particle Gibbs with ancestor sampling 

PMCMC methods is a class of MCMC samplers in which SMC is used to con- 
struct proposal kernels (2). The validity of these methods can be assessed by 
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viewing them as MCMC samplers on an extended state space in which all the 
random variables generated by the SMC sampler are seen as auxiliary variables. 
The target density on this extended space is given by 

By construction, this density admits ^It{x\. t ) as a marginal, and can thus be 
used as a surrogate for the original target density 7t [2]. Here A: is a variable 
indexing one of the particles at the final time point and b\ : x corresponds to the 
ancestral path of this particle: x\. T — = {Xi, . . . , x^}. These indices 

are given recursively from the ancestor indices by &t = k and b t — a t iY ■ The 
PG sampler [2| is a Gibbs sampler targeting cf> using the following sweep (note 
that bi-T = {a 2 2 T ,br}), 

1. Draw x*; 2 ; fcl:r ,a2 : V b2:T ~ (j){->q b ^ T ,aq b ^ T \ x\]£ ,6 1:T ). 

2. Draw k* ~ | x 1 l^ bl:T , a*^ &2:T , x^f , a b 2 2 Jf). 

Here we have introduced the notation x^~ TO = {xj, . . . , a;™ -1 , X™ , . . . , 2^}, 
x i-t' T = { x i" bl j ■••) x t 6t } an d similarly for the ancestor indices. In [2], a 
sequential procedure for sampling from the conditional density appearing in 
Step 1 is given. This method is known as conditional SMC (CSMC). It takes the 
form of an SMC sampler in which we condition on the event that a prespecified 
path with indices &i : t, is maintained throughout the sampler (see 

Algorithm [l] for a related procedure). Furthermore, the conditional distribution 
appearing in Step 2 of the PG sampler is shown to be proportional to Wj*, and 
it can thus straightforwardly be sampled from. 

Note that we never sample new values for the variables {x]]ip , in this 

sweep. Hence, the PG sampler is an "incomplete" Gibbs sampler, since it does 
not loop over all the variables of the model. It still holds that the PG sampler 
is ergodic, which intuitively can be explained by the fact that the collection of 
variables that is left out is chosen randomly at each iteration. However, it has 
been observed that the PG sampler can have very poor mixing, especially when 
N is small and/or T is large [3||4]. The reason for this poor mixing is that the 
SMC path degeneracy causes the collections of variables that are left out at any 
two consecutive iterations to be strongly dependent. 

We now turn to our new procedure, PG-AS, which aims to address this fun- 
damental issue. Our idea is to sample new values for the ancestor indices &i : t-i 
as part of the CSMC procedure^] By adding these variables to the Gibbs sweep, 
we can considerably improve the mixing of the PG kernel. The CSMC method is 
a sequential procedure to sample from <f>( 

i]t ) &1:t) by sampling 

according to {x*'A a*- 6 '} ~ 0(x t "\ | x^"- 1 ,^"- 1 ,^,^^, 

1 Ideally, we would like to include the variables as well, but this is in general not 

possible since it would be similar to sampling from the original target density (which we 
assume is infeasible). 
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for t = 1, . . . , T. After having sampled these variables at time t, we add a 
step in which we generate a new value for bt-\{— a*'), resulting in the following 
sweep: 

1'. ( CSMC with ancestor sampling) For t = 1, . . . , T, draw 

(a*' 6 ' =) bU ~ Hh-i I x^^a*^,^ At). 

2'. Draw (Jfe* =) 6^ - <j){b T | x*' T bl:T , a* :T , a^f). 

It can be verified that this corresponds to a partially collapsed Gibbs sam- 
pler [7] and will thus leave tf> invariant. To determine the conditional densities 
from which the ancestor indices are drawn, consider the following factorization, 
following directly from ([3]), 

Jt(Xl:t) = Wt{xi;t)vt-l{x\:t-l)Rt{xt I Xl:t-l)lt-l(%l:t-l) 




iix^OnMt^.xJ-). (6) 



Furthermore, we have 



h I xi :t , a 2: t, a;^ 1 .^ M+v.t) oc 0(xi :f , a 2:t , aJ^i.T , &t : r) 

7 T (xf. T )^(x 1:t ,a 2:t ) 7t(a&) 7t(^i:t) 



oc ^1=^ 11 (7) 



By plugging (|6| into the numerator we get, 



d>(bt 1 x 1:t ,a 2:t ,x^,6 t+ i:T) « ^'^4rr. (8) 



Hence, to sample a new ancestor index for the conditioned path at time t+1, wc 



proceed as follows. Given x' t+l . T {— a^^r) we compute the backward sampling 



weights, 

m m 7r({3ff f ,s; +1;T }) f , 



for m = 1, . . . , N . We then set b t — m with probability proportional to w^ T . 

It follows that the proposed CSMC with ancestor sampling (Step 1'), condi- 
tioned on {x' 1:T , four}, can be realized as in Algorithm]!] The difference between 
this algorithm and the CSMC sampler derived in [2] lies in the ancestor sam- 
pling step 2(b) (where instead, they set <z t ' = &t+i)- By introducing the ancestor 
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Algorithm 1 CSMC with ancestor sampling, conditioned on {x' x , T , b\ x x} 



1. Initialize (t = 1): 

(a) Draw a;™ ~ Ri{x\) for m ^ b\ and set xj 1 = x' x . 

(b) Set iuf = W x {xf) for m = 1, . . . , N. 

2. for t = 2, . . . , T: 

(a) Draw {a™,x™} ~ M t (a t ,x t ) for m ^ b t and set = a^. 

(b) Draw a^ 1 * with P{a\ t — m) cx u>J™ 

(c) Set x% = {x a J t _ 1 :xf } and w? = W t (x^ t ) for m = 1, . . . , 2V. 



sampling, we break the strong dependence between the generated particle tra- 
jectories and the path on which we condition. We call the resulting method, 
defined by Steps V and 2' above, PG with ancestor sampling (PG-AS). 

The idea of including the variables &i : t-i m the PG sampler has previously 
been suggested by Whiteley [8j and further explored in [3]|4j . This previous work, 
however, accomplishes this with a explicit backward simulation pass, which, as 
we discuss in the following section, is problematic for our applications to non- 
Markovian SSMs. In the PG-AS sampler, instead of requiring distinct forward 
and backward sequences of Gibbs steps as in PG with backward simulation 
(PG-BS), we obtain a similar effect via a single forward sweep. 



4 Truncation for non-Markovian models 

We return to the problem of inference in non-Markovian SSMs of the form shown 
in ([lj. To employ backward sampling, we need to evaluate the ratio 

Jt(xi :T ) p(xi:T,yi:T) T~T / i \ | \ / 1n N 
7 T = ^ T= II 9{Vs \Xl: S )f{X s (10 

In general, the computational cost of computing the backward sampling weights 
will thus be 0(T). This implies that the cost of generating a full backward 
trajectory is 0(T 2 ). It is therefore computationally prohibitive to employ back- 
ward simulation type of particle smoothers, as well as the PG samplers discussed 
above, for general non-Markovian models. 

To make progress, we consider non-Markovian models in which there is a 
decay in the influence of the past on the present, akin to that in Markovian 
models but without the strong Markovian assumption. Hence, it is possible 



to obtain a useful approximation when the product in (10 1 is truncated to a 
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smaller number of factors, say p. We then replace ^ with the approximation, 

~p,m m 7t+p({ ;c l7tJ x t+l:t+p}) f11l 
<T = W * — ■ ( n ) 



The following proposition formalizes our assumption. 

Proposition 1. Let P and P p be the probability distributions on {1, . . . , N}, 

defined by the backward sampling weight Q and the truncated backward sam- 
pling weights ([IT]), respectively. Let h s (k) = g(y t+s | x\. t ,x' t+1 . t+s )f(x' t _ 



x\. tl x' t+1 . t _ s ) and assume that max^j (h s (k)/h s (l) — 1) < Aexp(— cs), for some 
constants A and c > 0. Then, D^io(P\\P p ) < Cexp(— cp) for some constant C, 
where Z?kld is the Kullback-Leibler divergence (KLD). 

Proof. See Appendix [A"] □ 

From ( [IT] ), we see that we can compute the backward weights in constant 
time under the truncation within the PG-AS framework. The resulting approx- 
imation can be quite useful; indeed, in our experiments we have seen that even 
p = 1 can lead to very accurate inferential results. In general, however, it will 
not be known a priori how to set the truncation level p for any given problem. 
To address this problem, we propose to use an adaption of the truncation level. 



Since the approximative weights (111 can be evaluated sequentially, the idea is 
to start with p = 1 and then increase p until the weights have, in some sense, 
converged. In particular, in our experimental work, we have used the following 
simple approach. 



Let P p be the discrete probability measure defined by (111. Let e p — 
Dtv(P p , Pp-i) be the total variation (TV) distance between the distributions 
for two consecutive truncation levels. We then compute the exponentially de- 
caying moving average of the sequence e p , with forgetting factor 7 e [0, 1], and 
stop when this falls below some threshold r <G [0, 1]. This adaption scheme 
removes the requirement to specify p directly, but instead introduces the design 
parameters 7 and r. However, these parameters are much easier to reason about 
- a small value for 7 gives a rapid response to changes in e p whereas a large 
value gives a more conservative stopping rule, improving the accuracy of the 
approximation at the cost of higher computational complexity. A similar trade 
off holds for the threshold r as well. Most importantly, we have found that 
the same values for 7 and r can be used for a wide range of models, with very 
different mixing properties. 

To illustrate the effect of the adaption rule, and how the distribution P p 
typically evolves as we increase p, we provide two examples in Figure [TJ These 
examples are taken from the simulation study provided in Section |6.2| Note 
that the untruncated distribution P is given for the maximal value of p, i.e., 
furthest to the right in the figures. By using the adaptive truncation, we can 
stop the evaluation of the weights at a much earlier stage, and still obtain an 
accurate approximation of P. 
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Figure 1: Probability under P p as a function of the truncation level p for two 
different systems; one 5 dimensional (left) and one 20 dimensional (right). The 
N — 5 dotted lines correspond to P p (m) for to e {1, ... , N}, respectively (N.B. 
two of the lines overlap in the left figure). The dashed vertical lines show the 
value of the truncation level p a dpt., resulting from the adaption scheme with 
7 = 0.1 and r = 10~ 2 . See Section 6.2 for details on the experiments. 



5 Application areas 

In this section we present examples of problem classes involving non-Markovian 
SSMs for which the proposed PG-AS sampler can be applied. Numerical illus- 
trations are provided in Section [6] 



5.1 Rao-Blackwellized particle smoothing 

One popular approach to increase the efficiency of SMC samplers for SSMs is 
to marginalize over one component of the state, and apply an SMC sampler in 
the lower-dimensional marginal space. This leads to what is known as the Rao- 
Blackwellized particle filter (RBPF) [9-11 . The same approach has also been 



applied to state smoothing [12|[13], but it turns out that Rao-Blackwellization 
is less straightforward in this case, since the marginal state-process will be 
non-Markovian. As an example, a mixed linear/nonlinear Gaussian SSM (see, 
e.g., [IT]) with "nonlinear state" x t and "conditionally linear state" z t , can be 
reduced to 

X t ~ p(x t | Xl:t-l,!/l:t-l), Vt ~ p(Vt I Xl:t, Vl-.t-l)- (12) 

These conditional densities are Gaussian and can be evaluated for any fixed 
marginal state trajectory x\-.t-i by running a conditional Kalman filter to marginal- 
ize the Zt-process. 

In order to apply a backward-simulation-based method (e.g., a particle smoother) 
for this model, we need to evaluate the backward sampling weights Q. In a 
straightforward implementation, we thus need to run N Kalman filters for T — t 
time steps, for each t = 1, . . . , T — 1. The computational complexity of this 
calculation can be reduced by employing the truncation proposed in Section ^ ! 



2 For the specific problem of Rao-Blackwellized smoothing in conditionally Gaussian mod- 
els, a backward simulator which can be implemented in O(T) computational complexity has 
recently been proposed in [12] . This is based on the idea of propagating information backward 
in time as the backward samples are generated. 
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5.2 Particle smoothing for degenerate state-space models 

Many dynamical systems are most naturally modelled as degenerate in the sense 
that the transition kernel of the state process does not admit any dominating 
measure. For instance, consider a nonlinear system with additive noise of the 
form, 

C* = /(6-i) + Gw t _i, Vt=g(£t)+et, (13) 

where G is a tall matrix, and consequently rank(G) < dim(£ t ). That is, the 
process noise covariance matrix is singular. SMC samplers can straightforwardly 
be applied to this type of models, but it is more problematic to address the 
smoothing problem using particle methods. The reason is that the backward 
kernel also will be degenerate and it cannot be approximated in a natural way 
by the forward filter particles, as is normally done in backward-simulation-based 
particle smoothers. 

A possible remedy for this issue is to recast the degenerate SSM as a non- 
Markovian model in a lower-dimensional space. Let G = U [E 0] V T with 
unitary U and V be a singular value decomposition of G and let, 



^ u T tt = u T f(uu T tt-i) 



EV T wt-i 




(14) 



For simplicity we assume that Z\ is known. If this is not the case, it can be 
included in the system state or seen as a static parameter of the model. Hence, 
the sequence zi-t is <r(xi :t _i)-measurable and we can write Zt = z t (xi-t-i). With 
v t = TiV T Wt and by appropriate definitions of the functions f x and h, the model 



( 13 ) can thus be rewritten as, 

Xt = fx(Xl:t-l) + «t-l) Vt = h(xi: t ) + e t , (15) 

which is a non-degenerate, non-Markovian SSM. By exploiting the truncation 
proposed in Section|4]we can thus apply PG-AS to do inference in this model. In 
fact, this is nothing but another application of Rao-Blackwellization as discussed 
in Section |5.1| where the Zj-state is conditionally deterministic and thus trivially 
marginalizable . 

5.3 Additional problem classes 

There are many more problem classes in which non-Markovian models arise 
and in which backward-simulation-based methods can be of interest. For in- 
stance, the Dirichlet process mixture model (DPMM, see, e.g., [14]) is a pop- 
ular nonparametric Bayesian model for mixtures with an unknown number of 
components. Using a Polya urn representation, the mixture labels are given 
by a non-Markovian stochastic process, and the DPMM can thus be seen as 
a non-Markovian SSM. SMC has previously been used for inference in DP- 



MMs 15 16 . An interesting venue for future work is to use the PG-AS sampler 



for these models. A second example in Bayesian nonparametrics is Gaussian 
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process (GP) regression and classification (see, e.g., [l7]). The sample path of 
the GP can be seen as the state-process in a non-Markovian SSM. We can thus 
employ PMCMC, and in particular PG-AS, to address these inference problems. 

An application in genetics, for which SMC has been been successfully ap- 
plied, is reconstruction of phylogenetic trees 18 . A phylogenetic tree is a binary 
tree with observation at the leaf nodes. SMC is used to construct the tree in 
a bottom up fashion. A similar approach has also been used for Bayesian ag- 
glomerative clustering, in which SMC is used to construct a binary clustering 
tree based on Kingman's coalescent 19 . The generative models for the trees 



used in 18 , 19 are in fact Markovian, but the observations give rise to a con- 



ditional dependence which destroys the Markov property. To employ backward 
simulation to these models, we are thus faced with problems of a similar nature 
as those discussed in Section [H 



6 Numerical evaluation 

This section contains a numerical evaluation of the proposed method. First, we 
consider linear Gaussian systems, which is instructive since the exact smooth- 
ing density then is available, e.g., by running a modified Bryson-Frazicr (MBF) 



smoother 20 . Second, we apply the proposed method for joint state and pa- 



rameter inference in a target tracking scenario. 



6.1 RBPS: Linear Gaussian state-space model 

As a first example, we consider Rao-Blackwellized particle smoothing (RBPS) in 
a single-output 4th-order linear Gaussian SSM. The system has poles in —0.65, 
—0.12 and 0.22 ± O.lOi and is excited by white Gaussian noise with variance 
O.I/4. The scalar output y t is observed in white Gaussian noise with variance 
0.1. We generate T = 100 samples from the system and run PG-AS and PG- 
BS, marginalizing three out of the four states using an RBPF, i.e., dim(x t ) = 1. 
Both methods are run for R — 10000 iterations using N = 5 particles. The 
truncation level is set to p — 1, leading to a coarse approximation. The total 
computational complexity for each sampler is O(RNTp). We discard the first 
1000 iterations and then compute running means of the state trajectory x-ifp. 
From these, we then compute the running root mean squared errors (RMSEs) e r 
relative to the true posterior means (computed with an MBF smoother). Hence, 
if no approximation would have been made, we would expect e r — > 0, so any 
static error can be seen as the effect of the truncation. The results for five 
independent runs from both PG samplers are shown in Figure [2j First, we note 
that both methods give accurate results. Still, the error for PG-AS is close to 
an order of magnitude less than for PG-BS. Furthermore, it appears as if the 
error for PG-AS would decrease further, given more iterations, suggesting that 
the bias caused by the truncation is dominated by the Monte Carlo variance, 
even after R = 10000 iterations. 

For further comparison, we also run an untruncated forward filter/backward 
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o PG w. backward simulation 




1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 



Figure 2: Rao-Blackwellized state smoothing using PG. Running RMSEs for five 
independent runs of PG-AS (•) and PG-BS (o), respectively. The truncation 
level is set to p = 1. The solid line corresponds to a run of an untruncated 
FF-BS. 



simulator (FF-BS) particle smoother [21] , using N — 5000 forward filter parti- 
cles and M = 500 backward trajectories (with a computational complexity of 
0{NMT 2 )). The resulting RMSE value is shown as a solid line in Figure [5J 
These results suggest that PMCMC samplers, such as the PG-AS, indeed can 
be serious competitors to more "standard" particle smoothers. Even with p = 1, 
PG-AS outperforms FF-BS in terms of accuracy and, due to the fact that the 
ancestor sampling allows us to use as few as N = 5 particles at each iteration, 
at a lower computational cost. 

6.2 Random linear Gaussian systems with rank deficient 
process noise covariances 

To see how the PG samplers are affected by the choice of truncation level p 
and by the mixing properties of the system, wc evaluate them on random linear 
Gaussian SSMs of different orders. Wc generate 150 random systems, using the 
MATLAB function drss from the Control Systems Toolbox, with model orders 
2, 5 and 20 (50 systems for each model order). The number of outputs are taken 
as 1, 2 and 4 for the different model orders, respectively. The systems are then 
simulated for T — 200 time steps, driven by Gaussian process noise entering only 
on the first state component. Hence, the rank of the process noise covariance is 
1 for all systems. The process noise and measurement noise variances are both 
set to 0.1. 

We run the PG-AS and PG-BS samplers for 10000 iterations using N = 5 
particles. We consider different fixed truncation levels, (p = 1, 2 and 3 for 2nd 
order systems and p = 1, 5 and 10 for 5th and 20th order systems), as well as an 
adaptive level with 7 = 0.1 and r = 10~ 2 . Again, we compute running posterior 
means (discarding 1000 samples) and RMSE values relative the true posterior 
mean. Box plots are shown in Figure [3] Since the process noise only enters on 
one of the state components, the mixing tends to deteriorate as we increase the 
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d = 2 

iijijiji 

p = l P = 2 p = 3 Adapt. (3.8) P = l P = 5 p = 10 Adapt. (5.9) 

d = 20 

+* ++ 4t 

■ PG w. ancestral sampling 
PG w. backward simulation 

p = 1 P = 5 p = 10 Adapt. (10.6) 

Figure 3: Box plots of the RMSE errors for PG-AS (black) and PG-BS (gray), 
for 150 random systems of different dimensions d (upper left, d = 2; upper 
right, d = 5; bottom, d — 20). Different values for the truncation level p are 
considered. The rightmost boxes correspond to an adaptive threshold and the 
values in parentheses are the average over all systems and MCMC iterations (the 
same for both methods). The dots within the boxes show the median errors. 

model order. Figure [l] shows how the probability distributions on {1, . . . , N} 
change as we increase the truncation level, in two representative cases for a 5th 
and a 20th order system, respectively. By using an adapted level, we can obtain 
accurate results for systems of different dimensions, without having to change 
any settings between the runs. 
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6.3 Range-bearing tracking in model with rank deficient 
process noise covariance 

Target tracking is an area in which SMC methods have been applied with great 



success, see e.g. 22-24 . Tracking is most commonly seen as an online filtering 
problem, though in certain scenarios it might be beneficial to instead view it as 
a smoothing problem. For instance, if a target tracker in a surveillance system 
detects some abnormal behaviour, it can be interesting to apply a smoother to 
obtain refined estimates of the target's position prior to the detection. 

Here, we consider smoothing in a range-bearing target tracking scenario. 
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The system state consists of the target's position and velocity in two dimensions, 
£f = (pf p\ vf vf\ . We use a coordinated turn (CT) model, which is a 
standard model for a manoeuvring target (see e.g. |23|), 
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The turn rate is given by 
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(16a) 



(16b) 



which depends nonlinearly on the system state. The parameter 9 is the ma- 
noeuvre acceleration, which we assume is fixed but unknown. This is done to 
illustrate the fact that PG-AS straightforwardly can be used for joint param- 
eter and state inference, as pointed out in Section [T] The system is assumed 
to be affected by a random acceleration ui t ~ Af(0, Q) (the process noise), here 
with Q = 10/2- This is a common assumption for many models used in target 
tracking. The matrix G arises from a time discretization of a continuous time 
model, where T s = 0.1 is the sampling time. The initial state of the system is 
given a Gaussian prior, M ((500 500 0) T ,diag((20 20 5 5) T )). 

We assume that the range and bearing of the target can be observed, so that 
the measurements are given by, 



(V(pt) 2 + (p y t y 

' i \ arctan(p£/pf) 



+ e t , 



0, 



50 
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(17) 



This choice of measurement noise covariance corresponds to an accurate bearing 
measurement, but an uninformative range measurement. Such a measurement 
could for instance arise in visual tracking, where the range is estimated based 
on the size of the target. 

We initialize the system as £1 = (490 490 5) and simulate it for 
T = 200 time steps. The true target trajectory is shown in Figure |4j Note that 
the process noise covariance GQG T is singular, which implies that care needs 
to be taken when designing a smoothing algorithm for this model. Here, we 
apply a linear state transformation, as suggested in Section |5.2| to reduce the 
model to a lower-dimensional state-space. With G = U [E 0] V T we define 

\xj zJ] T — U T ^t- We then employ the PG-AS sampler for joint parameter 
and state inference, by targeting the density p{9,zi,x\-T \ Di-.t)- We apply a 
Metropolis-Hastings step to update 6, using a Gaussian random walk proposal 
with standard deviation a = 0.2 and target density p(8\zi,xi-,T,yi:T)- The 
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480 



460 480 500 

Figure 4: Target trajectory in the horizontal plane (black solid line) and 
smoothed posterior means for PG-AS (dashed line) and PMMH (*) under pa- 
rameter uncertainty. The gray line is the PF estimate and the dots show the 
range-bearing measurements, transformed to Cartesian coordinates. 



initial state of the system is unknown, so the variable z\ is seen as a part of 
the system state. That is, the SMC sampler targets the sequence of densities 
pe{zi,xi:t | yi-.t) for t = 1, T. 

It is worth to point out that this SMC sampler is not more complicated to 



implement than a sampler targeting the original model (16). In fact, a natural 
way to do the implementation is to run the sampler as if targeting \xj zJ~\ T 
jointljj^] The difference is that the Zt-particles are seen as conditional sufficient 
statistics for the z t -state (which is conditionally deterministic), similarly to how 
one propagates the sufficient statistics for the conditionally linear state in an 
RBPF. The difference lies in how the backward sampling is done, where in the 
marginal model we only consider the X(-states when computing the backward 
weights. 

The PG-AS sampler was run with N = 5 particles for 50000 iterations, with 
the first 10000 samples discarded as burnin. We used an adaptive truncation 
level with 7 = 0.1 and r = 10~ 2 (same as before), resulting in an average 
truncation level of 2.3. As a comparison, we also employ a particle marginal 
Metropolis-Hastings (PMMH) sampler [2], with N — 5000 particles, also run- 
ning for 50000 MCMC iterations (discarding the first 10000 samples). The 
smoothed estimates of the target trajectory are shown in Figure [4] and the pos- 
terior density of 9 is given in Figure [5] From these results we see that the 
PG-AS sampler provides accurate inferential results, despite the truncation of 



3 For the results presented here, we used a standard bootstrap PF, which is very straight- 
forward to implement. 
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Figure 5: Histograms representing the posterior density p{9 \ yi-r) for the PG- 
AS sampler (gray bars) and for PMMH (*). The "true" value, used in the data 
generation, is 1. 



the backward weights and without any problem specific tuning of the variables 
7 and r. 



7 Discussion 

PG-AS is a novel approach to PMCMC that makes use of backward simulation 
ideas without needing an explicit backward pass. Compared to PG-BS, a con- 
ceptually similar method that does require an explicit backward pass, PG-AS 
has advantages, most notably for inference in the non-Markovian SSMs that 
have been our focus here. When using the proposed truncation of the backward 
weights, we have found PG-AS to be more robust to the approximation error 
than PG-BS. Furthermore, for non-Markovian models, PG-AS is easier to im- 
plement than PG-BS, since it requires less bookkeeping. It can also be more 
memory efficient, since it does not require us to store intermediate quantities 
that are needed for a separate backward simulation pass, as is done in PG-BS. 
Finally, we note that PG-AS can be used as an alternative to PG-BS for other 
inference problems to which PMCMC can be applied, and we believe that it 
will prove attractive in problems beyond the non-Markovian SSMs that we have 
discussed here. 
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A Proof of Proposition [T] 

With M = T — t and w(k) — Wf, the distributions of interest are given by 
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It follows that the KLD is bounded according to, 
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